Skip to content

Conversation

@RogerPodacter
Copy link
Member

@RogerPodacter RogerPodacter commented Oct 30, 2025

Note

Enhances DataUri to recognize more base64 forms, percent-decode non-base64 payloads, support implicit data:, URIs, add JSON parsing helpers, and update ESIP6 detection with extensive tests.

  • Core (lib/facet_rails_common/data_uri.rb):
    • Base64 handling: Detects base64 anywhere in the header (case-insensitive); validates only when ;base64 extension is present; memoized strict decoding; base64?, claims_to_be_base64?, and data_valid_base64? added.
    • Decoding: Returns base64-decoded data when valid; otherwise percent-decodes via URI::DEFAULT_PARSER.unescape (preserves +).
    • Implicit form support: For data:,... defaults mimetype to text/plain, clears parameters/extension, and keeps data after the first comma intact.
    • API additions: Exposes mimetype, parameters (as array), extension, data; adds json? and parse_json helpers.
    • ESIP6 detection: esip6? reads raw parameters from the match to support legacy implicit forms.
  • Tests (test/data_uri_test.rb):
    • Comprehensive coverage for base64 variants (extension, params, MIME, case), invalid/empty payloads, implicit form, valid?, JSON helpers, ESIP6 legacy detection, and percent-decoding (spaces, HTML, SVG, plus sign).

Written by Cursor Bugbot for commit 5adc786. This will update automatically on new commits. Configure here.

@RogerPodacter RogerPodacter requested a review from Copilot October 30, 2025 18:13
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the DataUri class to handle base64 encoding detection more flexibly by checking the entire metadata section for "base64", not just the formal ;base64 extension. The refactoring also improves error handling for invalid base64 data.

Key changes:

  • Splits base64 detection into two concepts: claims_to_be_base64? (has ;base64 extension) vs metadata_contains_base64? (anywhere in metadata)
  • Invalid base64 data now returns the original data instead of raising an error, except when using the formal ;base64 extension
  • Adds memoization for base64 decoding to avoid repeated decode attempts

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
lib/facet_rails_common/data_uri.rb Refactors base64 detection and decoding logic with new helper methods and flexible metadata checking
test/data_uri_test.rb Adds comprehensive test coverage for various base64 detection scenarios and error cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@RogerPodacter RogerPodacter requested a review from Copilot October 30, 2025 18:16
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@RogerPodacter RogerPodacter requested a review from Copilot October 30, 2025 18:22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 107 to 110
header_end = match.begin(:data)
return false if header_end.nil?

uri[0...header_end].match?(/base64/i)
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex /base64/i could match 'base64' as part of unrelated words (e.g., 'mybase64file'). Consider using a word boundary or more specific pattern like /;base64/i or /[;=]base64/i to avoid false positives in MIME types or parameter values.

Suggested change
header_end = match.begin(:data)
return false if header_end.nil?
uri[0...header_end].match?(/base64/i)
# Check if the extension group is present and equals ';base64'
extension == ';base64'

Copilot uses AI. Check for mistakes.
- Added handling for implicit data URIs, defaulting mimetype to 'text/plain'.
- Introduced methods for JSON detection and parsing.
- Updated tests to cover new functionality and validate behavior for various data URIs.
…arsing

- Cache the result of JSON detection to improve performance.
- Change the default max nesting for JSON parsing from 200 to 100.
- Add a new test to verify behavior of implicit data URIs with commas.
Enhance DataUri class to support implicit data URIs and JSON parsing
"#{mimetype}#{parameters}"
end

def base64?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Mediatype Output Incorrectly Includes Parameters Array

The mediatype method concatenates mimetype with the @parameters array directly. Since @parameters is an array (set on line 35), string interpolation will produce incorrect output like "text/plain[\"foo=bar\", \"baz=qux\"]" instead of "text/plain;foo=bar;baz=qux". The parameters array should be joined with semicolons: "#{mimetype}#{parameters.map { |p| ";#{p}" }.join}" or similar.

Fix in Cursor Fix in Web

- Added :match attribute to the DataUri class for improved functionality.
- Removed unnecessary initialization of @match to nil in the constructor.
- Simplified the initialization of instance variables for mimetype, parameters, extension, and data.
- Consolidated logic for handling implicit data URIs to improve readability and maintainability.
- Added a new test to ensure proper error handling for legacy base64 extensions.
end
return unless claims_to_be_base64?

raise ArgumentError, 'malformed base64 content' unless data_valid_base64?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Base64 validation misses implicit claims.

The validation in validate_base64_content only triggers when claims_to_be_base64? returns true (checking for the ;base64 extension). For data:, URIs where "base64" appears in the data portion (like data:,text/plain;charset=utf-8;base64,invaliddata), the extension is nil, so claims_to_be_base64? returns false and validation doesn't happen. This causes the test test_implicit_form_with_legacy_base64_extension_raises to fail since it expects malformed base64 content to be detected and rejected even in this legacy format.

Fix in Cursor Fix in Web

- Introduced tests for implicit form with legacy base64 to ensure valid decoding and default behavior.
- Added validation tests for invalid base64 content and uppercase base64 parameters.
- Included handling for empty base64 content and cases with no mimetype, ensuring correct defaults and decoding.
- Updated the esip6? method to utilize legacy behavior for Ethscriptions.
- Added new tests to validate the handling of legacy implicit forms with the 'rule=esip6' parameter and other rules.
parameters.include?("rule=esip6")
# Use legacy behavior to support Ethscriptions
raw_parameters = String(data_uri.match[:parameters]).split(';')
raw_parameters.include?("rule=esip6")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: esip6? method fails for legacy implicit form URIs

The esip6? method accesses match[:parameters] directly to check for the rule=esip6 parameter. However, for implicit form URIs like data:,text/plain;rule=esip6,..., the regex captures everything after data:, as the data group, leaving match[:parameters] empty. This causes esip6? to return false even when rule=esip6 appears in the content, causing the test_esip6_legacy_implicit_form test to fail.

Fix in Cursor Fix in Web

- Modified the decoded_data method to utilize URI::DEFAULT_PARSER.unescape for proper decoding of data URIs.
- Added new tests to verify percent-decoding of spaces, plus signs, HTML content, and mixed content in data URIs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants